$Id: dir-spec-v1.txt 18266 2009-01-25 11:26:11Z kloesing $ Tor Protocol Specification Roger Dingledine Nick Mathewson 0. Preliminaries THIS SPECIFICATION IS OBSOLETE. This document specifies the Tor directory protocol as used in version 0.1.0.x and earlier. See dir-spec.txt for a current version. 1. Basic operation There is a small number of directory authorities, and a larger number of caches. Client and servers know public keys for the directory authorities. Tor servers periodically upload self-signed "router descriptors" to the directory authorities. Each authority publishes a self-signed "directory" (containing all the router descriptors it knows, and a statement on which are running) and a self-signed "running routers" document containing only the statement on which routers are running. All Tors periodically download these documents, downloading the directory less frequently than they do the "running routers" document. Clients preferentially download from caches rather than authorities. 1.1. Document format Router descriptors, directories, and running-routers documents all obey the following lightweight extensible information format. The highest level object is a Document, which consists of one or more Items. Every Item begins with a KeywordLine, followed by one or more Objects. A KeywordLine begins with a Keyword, optionally followed by whitespace and more non-newline characters, and ends with a newline. A Keyword is a sequence of one or more characters in the set [A-Za-z0-9-]. An Object is a block of encoded data in pseudo-Open-PGP-style armor. (cf. RFC 2440) More formally: Document ::= (Item | NL)+ Item ::= KeywordLine Object* KeywordLine ::= Keyword NL | Keyword WS ArgumentsChar+ NL Keyword = KeywordChar+ KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-' ArgumentChar ::= any printing ASCII character except NL. WS = (SP | TAB)+ Object ::= BeginLine Base-64-encoded-data EndLine BeginLine ::= "-----BEGIN " Keyword "-----" NL EndLine ::= "-----END " Keyword "-----" NL The BeginLine and EndLine of an Object must use the same keyword. When interpreting a Document, software MUST reject any document containing a KeywordLine that starts with a keyword it doesn't recognize. The "opt" keyword is reserved for non-critical future extensions. All implementations MUST ignore any item of the form "opt keyword ....." when they would not recognize "keyword ....."; and MUST treat "opt keyword ....." as synonymous with "keyword ......" when keyword is recognized. 2. Router descriptor format. Every router descriptor MUST start with a "router" Item; MUST end with a "router-signature" Item and an extra NL; and MUST contain exactly one instance of each of the following Items: "published" "onion-key" "link-key" "signing-key" "bandwidth". Additionally, a router descriptor MAY contain any number of "accept", "reject", "fingerprint", "uptime", and "opt" Items. Other than "router" and "router-signature", the items may appear in any order. The items' formats are as follows: "router" nickname address ORPort SocksPort DirPort Indicates the beginning of a router descriptor. "address" must be an IPv4 address in dotted-quad format. The last three numbers indicate the TCP ports at which this OR exposes functionality. ORPort is a port at which this OR accepts TLS connections for the main OR protocol; SocksPort is deprecated and should always be 0; and DirPort is the port at which this OR accepts directory-related HTTP connections. If any port is not supported, the value 0 is given instead of a port number. "bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed Estimated bandwidth for this router, in bytes per second. The "average" bandwidth is the volume per second that the OR is willing to sustain over long periods; the "burst" bandwidth is the volume that the OR is willing to sustain in very short intervals. The "observed" value is an estimate of the capacity this server can handle. The server remembers the max bandwidth sustained output over any ten second period in the past day, and another sustained input. The "observed" value is the lesser of these two numbers. "platform" string A human-readable string describing the system on which this OR is running. This MAY include the operating system, and SHOULD include the name and version of the software implementing the Tor protocol. "published" YYYY-MM-DD HH:MM:SS The time, in GMT, when this descriptor was generated. "fingerprint" A fingerprint (a HASH_LEN-byte of asn1 encoded public key, encoded in hex, with a single space after every 4 characters) for this router's identity key. A descriptor is considered invalid (and MUST be rejected) if the fingerprint line does not match the public key. [We didn't start parsing this line until Tor 0.1.0.6-rc; it should be marked with "opt" until earlier versions of Tor are obsolete.] "hibernating" 0|1 If the value is 1, then the Tor server was hibernating when the descriptor was published, and shouldn't be used to build circuits. [We didn't start parsing this line until Tor 0.1.0.6-rc; it should be marked with "opt" until earlier versions of Tor are obsolete.] "uptime" The number of seconds that this OR process has been running. "onion-key" NL a public key in PEM format This key is used to encrypt EXTEND cells for this OR. The key MUST be accepted for at least XXXX hours after any new key is published in a subsequent descriptor. "signing-key" NL a public key in PEM format The OR's long-term identity key. "accept" exitpattern "reject" exitpattern These lines, in order, describe the rules that an OR follows when deciding whether to allow a new stream to a given address. The 'exitpattern' syntax is described below. "router-signature" NL Signature NL The "SIGNATURE" object contains a signature of the PKCS1-padded hash of the entire router descriptor, taken from the beginning of the "router" line, through the newline after the "router-signature" line. The router descriptor is invalid unless the signature is performed with the router's identity key. "contact" info NL Describes a way to contact the server's administrator, preferably including an email address and a PGP key fingerprint. "family" names NL 'Names' is a whitespace-separated list of server nicknames. If two ORs list one another in their "family" entries, then OPs should treat them as a single OR for the purpose of path selection. For example, if node A's descriptor contains "family B", and node B's descriptor contains "family A", then node A and node B should never be used on the same circuit. "read-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL "write-history" YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,NUM,NUM,NUM... NL Declare how much bandwidth the OR has used recently. Usage is divided into intervals of NSEC seconds. The YYYY-MM-DD HH:MM:SS field defines the end of the most recent interval. The numbers are the number of bytes used in the most recent intervals, ordered from oldest to newest. [We didn't start parsing these lines until Tor 0.1.0.6-rc; they should be marked with "opt" until earlier versions of Tor are obsolete.] 2.1. Nonterminals in routerdescriptors nickname ::= between 1 and 19 alphanumeric characters, case-insensitive. exitpattern ::= addrspec ":" portspec portspec ::= "*" | port | port "-" port port ::= an integer between 1 and 65535, inclusive. addrspec ::= "*" | ip4spec | ip6spec ipv4spec ::= ip4 | ip4 "/" num_ip4_bits | ip4 "/" ip4mask ip4 ::= an IPv4 address in dotted-quad format ip4mask ::= an IPv4 mask in dotted-quad format num_ip4_bits ::= an integer between 0 and 32 ip6spec ::= ip6 | ip6 "/" num_ip6_bits ip6 ::= an IPv6 address, surrounded by square brackets. num_ip6_bits ::= an integer between 0 and 128 Ports are required; if they are not included in the router line, they must appear in the "ports" lines. 3. Directory format A Directory begins with a "signed-directory" item, followed by one each of the following, in any order: "recommended-software", "published", "router-status", "dir-signing-key". It may include any number of "opt" items. After these items, a directory includes any number of router descriptors, and a single "directory-signature" item. "signed-directory" Indicates the start of a directory. "published" YYYY-MM-DD HH:MM:SS The time at which this directory was generated and signed, in GMT. "dir-signing-key" The key used to sign this directory; see "signing-key" for format. "recommended-software" comma-separated-version-list A list of which versions of which implementations are currently believed to be secure and compatible with the network. "running-routers" whitespace-separated-list A description of which routers are currently believed to be up or down. Every entry consists of an optional "!", followed by either an OR's nickname, or "$" followed by a hexadecimal encoding of the hash of an OR's identity key. If the "!" is included, the router is believed not to be running; otherwise, it is believed to be running. If a router's nickname is given, exactly one router of that nickname will appear in the directory, and that router is "approved" by the directory server. If a hashed identity key is given, that OR is not "approved". [XXXX The 'running-routers' line is only provided for backward compatibility. New code should parse 'router-status' instead.] "router-status" whitespace-separated-list A description of which routers are currently believed to be up or down, and which are verified or unverified. Contains one entry for every router that the directory server knows. Each entry is of the format: !name=$digest [Verified router, currently not live.] name=$digest [Verified router, currently live.] !$digest [Unverified router, currently not live.] or $digest [Unverified router, currently live.] (where 'name' is the router's nickname and 'digest' is a hexadecimal encoding of the hash of the routers' identity key). When parsing this line, clients should only mark a router as 'verified' if its nickname AND digest match the one provided. "directory-signature" nickname-of-dirserver NL Signature The signature is computed by computing the digest of the directory, from the characters "signed-directory", through the newline after "directory-signature". This digest is then padded with PKCS.1, and signed with the directory server's signing key. If software encounters an unrecognized keyword in a single router descriptor, it MUST reject only that router descriptor, and continue using the others. Because this mechanism is used to add 'critical' extensions to future versions of the router descriptor format, implementation should treat it as a normal occurrence and not, for example, report it to the user as an error. [Versions of Tor prior to 0.1.1 did this.] If software encounters an unrecognized keyword in the directory header, it SHOULD reject the entire directory. 4. Network-status descriptor A "network-status" (a.k.a "running-routers") document is a truncated directory that contains only the current status of a list of nodes, not their actual descriptors. It contains exactly one of each of the following entries. "network-status" Must appear first. "published" YYYY-MM-DD HH:MM:SS (see section 3 above) "router-status" list (see section 3 above) "directory-signature" NL signature (see section 3 above) 5. Behavior of a directory server lists nodes that are connected currently speaks HTTP on a socket, spits out directory on request Directory servers listen on a certain port (the DirPort), and speak a limited version of HTTP 1.0. Clients send either GET or POST commands. The basic interactions are: "%s %s HTTP/1.0\r\nContent-Length: %lu\r\nHost: %s\r\n\r\n", command, url, content-length, host. Get "/tor/" to fetch a full directory. Get "/tor/dir.z" to fetch a compressed full directory. Get "/tor/running-routers" to fetch a network-status descriptor. Post "/tor/" to post a server descriptor, with the body of the request containing the descriptor. "host" is used to specify the address:port of the dirserver, so the request can survive going through HTTP proxies.